Query-based Biclustering using Formal Concept Analysis
Abstract
Biclustering methods have proven to be critical tools in the exploratory analysis of high-dimensional data, including information networks, microarray experiments, and bag-of-words data. However, most biclustering methods fail to answer specific questions of interest and do not incorporate prior knowledge and expertise from the user. To this end, query-based biclustering algorithms, recently developed in the context of microarray data, use a set of seed genes provided by the user, which are assumed to be tightly co-expressed or functionally related, to prune the search space and guide the biclustering algorithm. In this paper, we propose a novel Query-Based Bi-Clustering algorithm, QBBC, built on a new formulation that combines the advantages of low-variance biclustering techniques and Formal Concept Analysis. We prove that statistical dispersion measures that are order-preserving induce an ordering on the set of biclusters in the data. In turn, this ordering is exploited to form query-based biclusters efficiently. Our approach provides a mechanism to generalize query-based biclustering to sparse high-dimensional data such as information networks and bag-of-words data. Moreover, the proposed framework takes a local approach to query-based biclustering, as opposed to the global approaches that previous algorithms have employed. Experimental results indicate that this local approach often produces higher-quality and more precise biclusters than state-of-the-art query-based methods. In addition, our performance evaluation illustrates the efficiency and scalability of QBBC compared to full biclustering approaches and other existing query-based approaches.
Similar resources
Biclustering with Background Knowledge using Formal Concept Analysis
Biclustering methods have proven to be critical tools in the exploratory analysis of high-dimensional data including information networks, microarray experiments, and bag of words data. However, most biclustering methods fail to answer specific questions of interest and do not incorporate background knowledge and expertise from the user. To this end, query-based biclustering algorithms have bee...
Mining Biclusters of Similar Values with Triadic Concept Analysis
Biclustering numerical data became a popular data-mining task in the early 2000s, especially for analysing gene expression data. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data table. So-called biclusters of similar values can be thought of as maximal sub-tables with close values. Only few methods address...
Enumerating all maximal biclusters in numerical datasets
Biclustering has proved to be a powerful data analysis technique due to its wide success in various application domains. However, the existing literature presents efficient solutions only for enumerating maximal biclusters with constant values, or heuristic-based approaches which cannot find all biclusters or even guarantee the maximality of the obtained biclusters. Here, we present a general fa...
From Triconcepts to Triclusters
A novel approach to triclustering of three-way binary data is proposed. A tricluster is defined in terms of Triadic Formal Concept Analysis as a dense triset of a binary relation Y, describing the relationship between objects, attributes and conditions. This definition is a relaxation of the triconcept notion and makes it possible to find all triclusters and triconcepts contained in triclusters of l...
Extraction de biclusters à valeurs similaires avec l’analyse de concepts triadiques
Biclustering numerical data became a popular data-mining task in the early 2000s, especially for analysing gene expression data. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data table. So-called biclusters of similar values can be thought of as maximal sub-tables with close values. Only few methods address ...